Skip to content

Conversation

@harshagr70
Copy link
Contributor

@harshagr70 harshagr70 commented Aug 21, 2025

This PR introduces an optional flat-file bypass for get.trait.data(), allowing trait and prior data to be loaded directly from a CSV specified via <file_path> in <pfts> within settings.xml. When a file path is provided, the function reads PFTs, priors, and trait names from the flat file instead of querying BETYdb, enabling DB-independent runs (useful for offline workflows, testing, and CI). If no file path is specified, the original database workflow remains unchanged.

fixes a part of #3602

@github-actions github-actions bot added the Base label Aug 21, 2025
Copy link
Member

@dlebauer dlebauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this important step in the decoupling process!

A few requests

  • Could you please update the docs
  • Add a test that uses the file option

Questions:

  • Is there a reason not to have an argument to specify the location of the file? It seems having an input_file argument to would make the option more clear and straightforward to use.
  • Will this require updating query_pfts and query_priors so that they can also take a file?

@infotroph
Copy link
Member

infotroph commented Aug 26, 2025

Is there a reason not to have an argument to specify the location of the file?

I'm not opposed and I agree it'd make this easier to call directly, but am comfortable leaving it for a separate PR (and thus also punting on changes in query_pfts and query_priors) because the existing trait workflow already assumes each step modifies a settings object.

+1 to all the other suggestions

@harshagr70
Copy link
Contributor Author

Yes , it would be good if we keep that to next separate PR , or i can just add a commit in this PR to make it pass as an argument .
What you suggest now @infotroph ?

@harshagr70
Copy link
Contributor Author

@dlebauer , we don't need to specifically change the functions : query_pfts and query_priors , as we completely bypass these functions when using the flatfile , just filter data and pass the data frame to the downstream function get.trait.data.pft

Yes i will update the docs and write a test as well !

I have added the support to pass the file_path as in argument directly meanwhile keeping the logic of passing the path in the settings intact as well .

@dlebauer
Copy link
Member

@harshagr70 re: "we completely bypass these functions [query_pfts and query_priors] when using the flatfile" - perhaps out of scope for this PR, but where will the [pft,variable,prior] table used in the meta-analysis come from?

@infotroph
Copy link
Member

where will the [pft,variable,prior] table used in the meta-analysis come from?

@dlebauer It will be a CSV file specified by the user as settings$pft$file_path; I propose we try that out a bit and refine if that turns out to be a hassle.

@harshagr70 when we discussed this live we realized it could use a line in the documentation -- can you add that, please?

Copy link
Member

@infotroph infotroph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think this is now ready to go.

@dlebauer have your requests all been addressed? I'm inclined to skip adding tests because the existing test coverage targets get.trait.data.pft rather than get.trait.data and is therefore not affected by this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants